Hybrid Clustering Approach for Concept Generation
Abstract
Information retrieval is a major research area because of the huge volume of information accumulating in digital form. Many information retrieval techniques rest on the observation that the terms present in a document, together with their frequencies of occurrence, signify the semantics of the document. Recent attempts to find the documents relevant to a context represent each document in a Latent Semantic Indexing (LSI) model as a document-term vector holding a term weight for every index term in that document. Because the number of index terms is enormous, this leads to a high-dimensionality problem. The dimensionality can be reduced based on the observation that groups of terms associated with related concepts tend to occur together in a document, or to be absent together, depending on whether the document is relevant to that concept. Such a group of terms is identified as a Concept and can be viewed as a single dimension in a rough-set-based information retrieval system. In this paper we present a hybrid clustering approach for forming equivalence classes of terms associated with related concepts. It uses the outcome of hierarchical clustering to provide seed points for the Incremental K-means algorithm. Because term vectors are sparse, the cosine similarity estimate is found to be less effective for term clustering. Another promising proximity measure generally used in information retrieval is the Euclidean distance, but it is biased toward changes in term frequencies in larger documents when term weights are represented by term frequency-inverse document frequency (tf-idf) estimates. We therefore propose a new term weight estimate, term probability-inverse document frequency (tp-idf), for representing a term as a vector before clustering the terms.
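The pipeline described in the abstract (weight each term as a vector over documents, cluster hierarchically, then use the resulting clusters as K-means seed points) can be illustrated with a minimal sketch. Several parts are assumptions rather than the paper's method: the abstract does not define tp-idf, so plain tf-idf weights are used instead; centroid-linkage agglomerative clustering stands in for the unspecified hierarchical step; and ordinary batch K-means is swapped in for the Incremental K-means variant. All function names are hypothetical.

```python
import math

def tfidf_term_vectors(docs):
    """Map each term to a vector with one tf-idf weight per document.
    Assumption: tf-idf stands in for tp-idf, which the abstract does not define."""
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    vectors = {}
    for term in vocab:
        df = sum(1 for doc in docs if term in doc)   # document frequency
        idf = math.log(n_docs / df)
        vectors[term] = [doc.count(term) * idf for doc in docs]
    return vectors

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return [sum(column) / len(points) for column in zip(*points)]

def hierarchical_seeds(points, k):
    """Agglomerative clustering (centroid linkage, an assumed choice):
    merge the two closest clusters until k remain, then return their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans(points, seeds, max_iter=100):
    """Batch K-means started from the given seeds (not the incremental variant).
    Returns one cluster index per point."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels

def concept_groups(docs, k):
    """Group terms into k candidate concepts (sets of co-occurring terms)."""
    vectors = tfidf_term_vectors(docs)
    terms = list(vectors)
    points = [vectors[t] for t in terms]
    labels = kmeans(points, hierarchical_seeds(points, k))
    groups = {}
    for term, label in zip(terms, labels):
        groups.setdefault(label, set()).add(term)
    return list(groups.values())

# Toy corpus: each document is a list of tokens.
docs = [
    ["cluster", "centroid", "cluster", "centroid"],
    ["cluster", "centroid"],
    ["query", "index", "query", "index"],
    ["query", "index"],
]
groups = concept_groups(docs, k=2)
```

On this toy corpus the terms that always occur together end up in the same group: "cluster" with "centroid", and "query" with "index". Seeding K-means from the hierarchical result removes the dependence on random initial centers, which is presumably the motivation for the hybrid design.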
Similar resources
Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms
UCTTP is an NP-hard problem that must be solved anew every semester. The main technique in the presented approach is analyzing data to resolve uncertainties in lecturers’ preferences and constraints within a department, in order to obtain a ranking for each lecturer based on their requirements, with the aim of increasing their satisfaction and develo...
Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach
The clustering problem under the minimum sum-of-squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and fails to converge to a globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...
A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection
Considering the concept of clustering, the main idea of the present study is that the stocks to be chosen and ranked will not necessarily all fall in one cluster. Taking this point into account, this study aims to offer a new methodology for making decisions about forming a portfolio of stocks in the stock market. To meet this end, Multiple-Criteria Decisi...
Electrofacies clustering and a hybrid intelligent based method for porosity and permeability prediction in the South Pars Gas Field, Persian Gulf
This paper proposes a two-step approach for characterizing the reservoir properties of the world’s largest non-associated gas reservoir. This approach integrates geological and petrophysical data and compares them with the field performance analysis to achieve a practical electrofacies clustering. Porosity and permeability prediction is done on the basis of linear functions, succeeding the elec...
Intrusion Detection based on a Novel Hybrid Learning Approach
Information security and Intrusion Detection Systems (IDS) play a critical role in the Internet. An IDS is an essential tool for detecting different kinds of attacks in a network and for maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...
A Hybrid Supervised-Unsupervised Vocabulary Generation Algorithm for Visual Concept Recognition
Vocabulary generation is the essential step in the bag-of-words image representation for visual concept recognition, because its quality affects classification performance substantially. In this paper, we propose a hybrid method for visual word generation which combines unsupervised density-based clustering with the discriminative power of fast support vector machines. We aim at three goals: bre...